To practice SQL querying, I connect to a Google Big Query Database and do some routine querying tasks. We have a database named Test with 3 tables: Inventory, Products, and Sales. Inventory contains ProductId, StoreId, StoreName, Address, neighborhood, QuantityAvailable. Products contains ProductId, ProductName, Supplier, ProductCost. Sales Contains SalesId, StoreId, ProductId, UnitPrice, Quantity.
install.packages('bigrquery', repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/Steve/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)
## package 'bigrquery' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'bigrquery'
## Warning in file.copy(savedcopy, lib, recursive = TRUE):
## problem copying C:\Users\Steve\AppData\Local\R\win-
## library\4.2\00LOCK\bigrquery\libs\x64\bigrquery.dll to C:
## \Users\Steve\AppData\Local\R\win-library\4.2\bigrquery\libs\x64\bigrquery.dll:
## Permission denied
## Warning: restored 'bigrquery'
##
## The downloaded binary packages are in
## C:\Users\Steve\AppData\Local\Temp\RtmpMzGzRg\downloaded_packages
install.packages('dplyr', repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/Steve/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)
## package 'dplyr' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'dplyr'
## Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying C:
## \Users\Steve\AppData\Local\R\win-library\4.2\00LOCK\dplyr\libs\x64\dplyr.dll
## to C:\Users\Steve\AppData\Local\R\win-library\4.2\dplyr\libs\x64\dplyr.dll:
## Permission denied
## Warning: restored 'dplyr'
##
## The downloaded binary packages are in
## C:\Users\Steve\AppData\Local\Temp\RtmpMzGzRg\downloaded_packages
library(bigrquery)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
con <- dbConnect(
bigrquery::bigquery(),
project = "my-data-project-366100",
dataset = "Test",
billing = "my-data-project-366100"
)
dbListTables(con)
## ! Using an auto-discovered, cached token.
## To suppress this message, modify your code or options to clearly consent to
## the use of a cached token.
## See gargle's "Non-interactive auth" vignette for more details:
## <]8;;https://gargle.r-lib.org/articles/non-interactive-auth.htmlhttps://gargle.r-lib.org/articles/non-interactive-auth.html]8;;>
## ℹ The bigrquery package is using a cached token for ']8;;mailto:wssitu@gmail.comwssitu@gmail.com]8;;'.
## [1] "Inventory" "Products" "Sales"
We Create a SQL query that does a FULL OUTER join to create a single table
Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID
LIMIT 20
print(M1_results)
## # A tibble: 20 × 16
## ProductId StoreId StoreName Address neigh…¹ Quant…² Produ…³ Produ…⁴ Suppl…⁵
## <int> <int> <chr> <chr> <chr> <int> <int> <chr> <chr>
## 1 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 2 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 3 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 4 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 5 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 6 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 7 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 8 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 9 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 10 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 11 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 12 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 13 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 14 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 15 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 16 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 17 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 18 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 19 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 20 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## # … with 7 more variables: ProductCost <dbl>, SalesId <int>, StoreId_1 <int>,
## # ProductId_2 <int>, Date <date>, UnitPrice <dbl>, Quantity <int>, and
## # abbreviated variable names ¹neighborhood, ²QuantityAvailable, ³ProductId_1,
## # ⁴ProductName, ⁵Supplier
Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID
LIMIT 20
print(M2_results)
## # A tibble: 20 × 16
## ProductId StoreId StoreName Address neigh…¹ Quant…² Produ…³ Produ…⁴ Suppl…⁵
## <int> <int> <chr> <chr> <chr> <int> <int> <chr> <chr>
## 1 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 2 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 3 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 4 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 5 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 6 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 7 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 8 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 9 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 10 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 11 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 12 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 13 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 14 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 15 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 16 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 17 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 18 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 19 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## 20 828 22623 Gordmans 0 Vermon… Hampden 4 828 Pepper… Gordma…
## # … with 7 more variables: ProductCost <dbl>, SalesId <int>, StoreId_1 <int>,
## # ProductId_2 <int>, Date <date>, UnitPrice <dbl>, Quantity <int>, and
## # abbreviated variable names ¹neighborhood, ²QuantityAvailable, ³ProductId_1,
## # ⁴ProductName, ⁵Supplier
Select *
FROM (Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID)
WHERE StoreName IS NULL
LIMIT 20
print(M3_results)
## # A tibble: 0 × 16
## # … with 16 variables: ProductId <int64>, StoreId <int64>, StoreName <chr>,
## # Address <chr>, neighborhood <chr>, QuantityAvailable <int64>,
## # ProductId_1 <int64>, ProductName <chr>, Supplier <chr>, ProductCost <dbl>,
## # SalesId <int64>, StoreId_1 <int64>, ProductId_2 <int64>, Date <date>,
## # UnitPrice <dbl>, Quantity <int64>
Select COUNT(DISTINCT StoreName) AS distinct_count
FROM (Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID)
LIMIT 20
print(M4_results)
## # A tibble: 1 × 1
## distinct_count
## <int>
## 1 34
Select *
FROM (Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID)
WHERE StoreName = "Walmart"
LIMIT 20
print(M5_results)
## # A tibble: 20 × 16
## ProductId StoreId StoreName Address neigh…¹ Quant…² Produ…³ Produ…⁴ Suppl…⁵
## <int> <int> <chr> <chr> <chr> <int> <int> <chr> <chr>
## 1 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 2 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 3 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 4 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 5 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 6 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 7 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 8 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 9 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 10 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 11 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 12 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 13 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 14 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 15 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 16 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 17 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 18 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 19 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 20 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## # … with 7 more variables: ProductCost <dbl>, SalesId <int>, StoreId_1 <int>,
## # ProductId_2 <int>, Date <date>, UnitPrice <dbl>, Quantity <int>, and
## # abbreviated variable names ¹neighborhood, ²QuantityAvailable, ³ProductId_1,
## # ⁴ProductName, ⁵Supplier
Select *
FROM (Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID)
WHERE QuantityAvailable >= 5
LIMIT 20
print(M6_results)
## # A tibble: 20 × 16
## ProductId StoreId StoreName Address neigh…¹ Quant…² Produ…³ Produ…⁴ Suppl…⁵
## <int> <int> <chr> <chr> <chr> <int> <int> <chr> <chr>
## 1 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 2 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 3 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 4 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 5 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 6 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 7 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 8 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 9 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 10 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 11 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 12 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 13 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 14 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 15 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 16 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 17 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 18 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 19 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## 20 29 21777 Walmart 2 Laurel… Sabina… 5 29 Soup -… Walmart
## # … with 7 more variables: ProductCost <dbl>, SalesId <int>, StoreId_1 <int>,
## # ProductId_2 <int>, Date <date>, UnitPrice <dbl>, Quantity <int>, and
## # abbreviated variable names ¹neighborhood, ²QuantityAvailable, ³ProductId_1,
## # ⁴ProductName, ⁵Supplier
Select UnitPrice, Quantity, UnitPrice * Quantity AS total_revenue
FROM (Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID)
LIMIT 20
M7_results
## # A tibble: 20 × 3
## UnitPrice Quantity total_revenue
## <dbl> <int> <dbl>
## 1 0.507 9 4.57
## 2 0.507 40 20.3
## 3 0.507 90 45.7
## 4 0.507 15 7.61
## 5 0.507 40 20.3
## 6 0.507 85 43.1
## 7 0.507 20 10.1
## 8 0.507 37 18.8
## 9 0.507 43 21.8
## 10 0.507 47 23.9
## 11 0.507 12 6.09
## 12 0.507 53 26.9
## 13 0.507 28 14.2
## 14 0.507 69 35.0
## 15 0.507 60 30.4
## 16 0.507 55 27.9
## 17 0.507 13 6.60
## 18 0.507 58 29.4
## 19 0.507 12 6.09
## 20 0.507 46 23.3
Select StoreName, ProductName, AVG(Quantity) AS average_quantity
FROM (Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID)
GROUP BY StoreName, ProductName
ORDER BY average_quantity DESC
LIMIT 20
M8_results
## # A tibble: 20 × 3
## StoreName ProductName average_quantity
## <chr> <chr> <dbl>
## 1 HomeGoods Five Alive Citrus 55.6
## 2 Family Dollar Toamtoes 6x7 Select 55.5
## 3 Renys Truffle Paste 55.3
## 4 HomeSense Kumquat 55.3
## 5 Family Dollar Cakes Assorted 55.2
## 6 Target Cheese - Cream Cheese 55.2
## 7 Tuesday Morning Bag Clear 10 Lb 55.1
## 8 T.J. Maxx Beef - Kobe Striploin 55.0
## 9 Ollie's Bargain Outlet Beets - Candy Cane, Organic 54.9
## 10 Ollie's Bargain Outlet Veal - Tenderloin, Untrimmed 54.7
## 11 Gabe's Pepper - Black, Ground 54.7
## 12 Gabe's Five Alive Citrus 54.7
## 13 Dd's Discounts Spice - Greek 1 Step 54.7
## 14 Big Lots Beer - Paulaner Hefeweisse 54.6
## 15 Meijer Sauce - White, Mix 54.6
## 16 Ross Stores Pop Shoppe Cream Soda 54.5
## 17 Ocean State Job Lot Lamb - Whole, Frozen 54.5
## 18 Renys Sauce - Salsa 54.5
## 19 Ross Stores Cleaner - Comet 54.4
## 20 Gabe's Sunflower Seed Raw 54.4
Select StoreName, ProductName, AVG(Quantity) AS average_quantity
FROM (Select *
FROM Test.Inventory
FULL JOIN Test.Products
ON Inventory.ProductID = Products.ProductID
FULL JOIN Test.Sales
On Inventory.ProductID = Sales.ProductID)
GROUP BY StoreName, ProductName
HAVING StoreName = "Walmart" OR StoreName = "Dollar General"
ORDER BY average_quantity DESC
LIMIT 20
print(M9_results)
## # A tibble: 20 × 3
## StoreName ProductName average_quantity
## <chr> <chr> <dbl>
## 1 Dollar General Country Roll 54.4
## 2 Dollar General Sunflower Seed Raw 54.1
## 3 Walmart Pepper - Sorrano 53.9
## 4 Dollar General Lamb - Loin, Trimmed, Boneless 53.9
## 5 Dollar General Fenngreek Seed 53.0
## 6 Walmart Beans - Turtle, Black, Dry 53.0
## 7 Dollar General Persimmons 52.9
## 8 Walmart Chivas Regal - 12 Year Old 52.9
## 9 Walmart Appetizer - Mini Egg Roll, Shrimp 52.8
## 10 Dollar General Potatoes - Mini Red 52.8
## 11 Dollar General Sole - Fillet 52.6
## 12 Walmart Longos - Cheese Tortellini 52.5
## 13 Walmart Ginger - Ground 52.1
## 14 Walmart Spice - Paprika 51.8
## 15 Dollar General Beef - Tongue, Fresh 51.8
## 16 Walmart Steamers White 51.5
## 17 Dollar General Sherry - Dry 51.2
## 18 Walmart Soup - Campbells, Creamy 51.1
## 19 Walmart Breakfast Quesadillas 51.1
## 20 Dollar General Chef Hat 25cm 51.1